gpu offload host code generation #142097

Draft
wants to merge 8 commits into
base: master

Conversation

ZuseZ4
Member

@ZuseZ4 ZuseZ4 commented Jun 5, 2025

r? ghost

This PR generates most of the host-side code needed to use LLVM's offload feature.
This first PR only handles automatic memory transfers to and from the device.
So if a user calls a kernel, we will copy the inputs to the device and back, but we won't perform the actual kernel launch.
Before merging, we will use LLVM's debug infrastructure to verify that the memcopies match what OpenMP offload generates in C++.
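
For the comparison against OpenMP, it can help to decode map-type values such as 800, 544 and 547 (which show up in the generated `.offload_maptypes` arrays) into flag names. A minimal sketch, assuming the flag bits match the `OMP_TGT_MAPTYPE_*` values in LLVM's offload runtime; that mapping is an assumption here, and `decode_maptype` is a hypothetical helper, not part of this PR:

```rust
/// Map-type flag bits. ASSUMPTION: these mirror the OMP_TGT_MAPTYPE_* values
/// used by LLVM's OpenMP offload runtime; they are not taken from this PR.
const FLAGS: [(u64, &str); 10] = [
    (0x001, "TO"),           // copy host -> device
    (0x002, "FROM"),         // copy device -> host
    (0x004, "ALWAYS"),
    (0x008, "DELETE"),
    (0x010, "PTR_AND_OBJ"),
    (0x020, "TARGET_PARAM"), // passed as a kernel argument
    (0x040, "RETURN_PARAM"),
    (0x080, "PRIVATE"),
    (0x100, "LITERAL"),      // passed by value
    (0x200, "IMPLICIT"),     // mapping was not written out explicitly
];

/// Hypothetical helper: returns the names of all known flag bits set in `maptype`,
/// in ascending bit order.
fn decode_maptype(maptype: u64) -> Vec<&'static str> {
    FLAGS
        .iter()
        .filter(|&&(bit, _)| maptype & bit != 0)
        .map(|&(_, name)| name)
        .collect()
}
```

Under these assumptions, `decode_maptype(547)` yields `TO`, `FROM`, `TARGET_PARAM` and `IMPLICIT`, so an argument with that map type is copied in both directions, while 544 (`TARGET_PARAM`, `IMPLICIT`) triggers no copy.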

A follow-up PR will generate the actual device-side kernel which will then do computations on the GPU.
A third PR will implement manual host2device and device2host functionality, but the goal is to minimize the cases where a user has to override our default handling for performance reasons.

I'm trying to get a full MVP out first, so this just recognizes GPU functions based on magic names. The final frontend will move this over to proper macros, as I'm already doing for the autodiff work.
This work will also be compatible with std::autodiff, so one can differentiate GPU kernels.

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2025
@ZuseZ4 ZuseZ4 added F-gpu_offload `#![feature(gpu_offload)]` and removed A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2025
@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2025
@rust-log-analyzer
Collaborator

The job aarch64-gnu-llvm-19-2 failed! Check out the build log: (web) (plain)

Possible cause of the failure (guessed by this bot):
Diff in /checkout/compiler/rustc_session/src/config.rs:3068:
     };
 
     use super::{
-        AutoDiff, Offload, BranchProtection, CFGuard, CFProtection, CollapseMacroDebuginfo, CoverageOptions,
+        AutoDiff, BranchProtection, CFGuard, CFProtection, CollapseMacroDebuginfo, CoverageOptions,
         CrateType, DebugInfo, DebugInfoCompression, ErrorOutputType, FmtDebug, FunctionReturn,
         InliningThreshold, InstrumentCoverage, InstrumentXRay, LinkerPluginLto, LocationDetail,
-        LtoCli, MirStripDebugInfo, NextSolverConfig, OomStrategy, OptLevel, OutFileName,
+        LtoCli, MirStripDebugInfo, NextSolverConfig, Offload, OomStrategy, OptLevel, OutFileName,
         OutputType, OutputTypes, PatchableFunctionEntry, Polonius, RemapPathScopeComponents,
         ResolveDocLinks, SourceFileHashAlgorithm, SplitDwarfKind, SwitchWithOptPath,
         SymbolManglingVersion, WasiExecModel,
Diff in /checkout/compiler/rustc_codegen_llvm/src/declare.rs:99:
             )
         }
     }
-
 }
 
 impl<'ll, 'tcx> CodegenCx<'ll, 'tcx> {
Diff in /checkout/compiler/rustc_codegen_llvm/src/common.rs:377:
     }
 }
 
-pub(crate) fn named_struct<'ll>(
-    ty: &'ll Type,
-    elts: &[&'ll Value],
-) -> &'ll Value {
+pub(crate) fn named_struct<'ll>(ty: &'ll Type, elts: &[&'ll Value]) -> &'ll Value {
     let len = c_uint::try_from(elts.len()).expect("LLVMConstStructInContext elements len overflow");
     unsafe { llvm::LLVMConstNamedStruct(ty, elts.as_ptr(), len) }
 }
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:6:
 use std::sync::Arc;
 use std::{io, iter, slice};
 
+use llvm::Linkage::*;
 use object::read::archive::ArchiveFile;
 use rustc_codegen_ssa::back::lto::{LtoModuleCodegen, SerializedModule, ThinModule, ThinShared};
 use rustc_codegen_ssa::back::symbol_export;
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:22:
 use rustc_session::config::{self, CrateType, Lto};
 use tracing::{debug, info};
 
-use llvm::Linkage::*;
-
 use crate::back::write::{
     self, CodegenDiagnosticsStage, DiagnosticHandlers, bitcode_section_name, save_temp_bitcode,
 };
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:31:
     DynamicLinkingWithLTO, LlvmError, LtoBitcodeFromRlib, LtoDisallowed, LtoDylib, LtoProcMacro,
 };
 use crate::llvm::AttributePlace::Function;
-use crate::llvm::{self, build_string, Linkage};
+use crate::llvm::{self, Linkage, build_string};
 use crate::{LlvmCodegenBackend, ModuleLlvm, SimpleCx, attributes};
 
 /// We keep track of the computed LTO cache keys from the previous
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:655:
     // We then run the llvm_optimize function a second time, to optimize the code which we generated
     // in the enzyme differentiation pass.
     let enable_ad = config.autodiff.contains(&config::AutoDiff::Enable);
-    let enable_gpu = true;//config.offload.contains(&config::Offload::Enable);
+    let enable_gpu = true; //config.offload.contains(&config::Offload::Enable);
     let stage = if thin {
         write::AutodiffStage::PreAD
     } else {
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:699:
             //   void *AuxAddr;
             // } __tgt_offload_entry;
             let entry_elements = vec![ti64, ti16, ti16, ti32, tptr, tptr, ti64, ti64, tptr];
-            let kernel_elements = vec![ti32, ti32, tptr, tptr, tptr, tptr, tptr, tptr, ti64, ti64, tarr, tarr, ti32];
+            let kernel_elements =
+                vec![ti32, ti32, tptr, tptr, tptr, tptr, tptr, tptr, ti64, ti64, tarr, tarr, ti32];
 
             cx.set_struct_body(offload_entry_ty, &entry_elements, false);
             cx.set_struct_body(kernel_arguments_ty, &kernel_elements, false);
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:706:
             let global = cx.declare_global("my_struct_global", offload_entry_ty);
             let global = cx.declare_global("my_struct_global2", kernel_arguments_ty);
-//@my_struct_global = external global %struct.__tgt_offload_entry
-//@my_struct_global2 = external global %struct.__tgt_kernel_arguments
+            //@my_struct_global = external global %struct.__tgt_offload_entry
+            //@my_struct_global2 = external global %struct.__tgt_kernel_arguments
             dbg!(&offload_entry_ty);
             dbg!(&kernel_arguments_ty);
             //LLVMTypeRef elements[9] = {i64Ty, i16Ty, i16Ty, i32Ty, ptrTy, ptrTy, i64Ty, i64Ty, ptrTy};
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:717:
                     continue;
                 }
 
-                fn add_priv_unnamed_arr<'ll>(cx: &SimpleCx<'ll>, name: &str, vals: &[u64]) -> &'ll llvm::Value{
+                fn add_priv_unnamed_arr<'ll>(
+                    cx: &SimpleCx<'ll>,
+                    name: &str,
+                    vals: &[u64],
+                ) -> &'ll llvm::Value {
                     let ti64 = cx.type_i64();
                     let size_ty = cx.type_array(ti64, vals.len() as u64);
                     let mut size_val = Vec::with_capacity(vals.len());
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:728:
                     add_unnamed_global(cx, name, initializer, PrivateLinkage)
                 }
 
-                fn add_global<'ll>(cx: &SimpleCx<'ll>, name: &str, initializer: &'ll llvm::Value, l: Linkage) -> &'ll llvm::Value {
+                fn add_global<'ll>(
+                    cx: &SimpleCx<'ll>,
+                    name: &str,
+                    initializer: &'ll llvm::Value,
+                    l: Linkage,
+                ) -> &'ll llvm::Value {
                     let c_name = CString::new(name).unwrap();
-                    let llglobal: &'ll llvm::Value = llvm::add_global(cx.llmod, cx.val_ty(initializer), &c_name);
+                    let llglobal: &'ll llvm::Value =
+                        llvm::add_global(cx.llmod, cx.val_ty(initializer), &c_name);
                     llvm::set_global_constant(llglobal, true);
                     llvm::set_linkage(llglobal, l);
                     llvm::set_initializer(llglobal, initializer);
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:737:
                     llglobal
                 }
 
-                fn add_unnamed_global<'ll>(cx: &SimpleCx<'ll>, name: &str, initializer: &'ll llvm::Value, l: Linkage) -> &'ll llvm::Value {
+                fn add_unnamed_global<'ll>(
+                    cx: &SimpleCx<'ll>,
+                    name: &str,
+                    initializer: &'ll llvm::Value,
+                    l: Linkage,
+                ) -> &'ll llvm::Value {
                     let llglobal = add_global(cx, name, initializer, l);
-                    unsafe {llvm::LLVMSetUnnamedAddress(llglobal, llvm::UnnamedAddr::Global)};
+                    unsafe { llvm::LLVMSetUnnamedAddress(llglobal, llvm::UnnamedAddr::Global) };
                     llglobal
                 }
 
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:746:
                 // We add a pair of sizes and maptypes per offloadable function.
                 // @.offload_maptypes = private unnamed_addr constant [4 x i64] [i64 800, i64 544, i64 547, i64 544]
-                let o_sizes = add_priv_unnamed_arr(&cx, &format!(".offload_sizes.{num}"), &vec![8u64,0,16,0]);
-                let o_types = add_priv_unnamed_arr(&cx, &format!(".offload_maptypes.{num}"), &vec![800u64, 544, 547, 544]);
+                let o_sizes = add_priv_unnamed_arr(
+                    &cx,
+                    &format!(".offload_sizes.{num}"),
+                    &vec![8u64, 0, 16, 0],
+                );
+                let o_types = add_priv_unnamed_arr(
+                    &cx,
+                    &format!(".offload_maptypes.{num}"),
+                    &vec![800u64, 544, 547, 544],
+                );
                 // TODO: We should add another pair per call to offloadable functions
                 // @.offload_sizes.5 = private unnamed_addr constant [2 x i64] [i64 16384, i64 16384]
                 // @.offload_maptypes.6 = private unnamed_addr constant [2 x i64] [i64 1, i64 3]
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:770:
                 let c_section_name = CString::new(".llvm.rodata.offloading").unwrap();
                 llvm::set_section(llglobal, &c_section_name);
 
-
                 // New, TODO: cleanup
                 let name = format!(".offloading.entry.kernel_{num}");
                 let ci64_0 = cx.get_const_i64(0);
Diff in /checkout/compiler/rustc_codegen_llvm/src/back/lto.rs:777:
                 let ci16_1 = cx.get_const_i16(1);
-                let elems: Vec<&llvm::Value> = vec![ci64_0, ci16_1, ci16_1, cx.get_const_i32(0), region_id, llglobal, ci64_0, ci64_0, cx.const_null(cx.type_ptr())];
+                let elems: Vec<&llvm::Value> = vec![
+                    ci64_0,
+                    ci16_1,
+                    ci16_1,
+                    cx.get_const_i32(0),
+                    region_id,
+                    llglobal,
+                    ci64_0,
+                    ci64_0,
+                    cx.const_null(cx.type_ptr()),
+                ];
 
                 let initializer = crate::common::named_struct(offload_entry_ty, &elems);
                 let c_name = CString::new(name).unwrap();
fmt: checked 6049 files
Build completed unsuccessfully in 0:00:39
  local time: Sun Jun  8 01:08:48 UTC 2025
  network time: Sun, 08 Jun 2025 01:08:48 GMT
##[error]Process completed with exit code 1.
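
For readers following the struct setup in the diff above: the `entry_elements` list (`i64, i16, i16, i32, ptr, ptr, i64, i64, ptr`) can be mirrored as a `#[repr(C)]` Rust struct to sanity-check the layout. This is only a sketch. The field names other than `AuxAddr` are assumptions based on LLVM's offload headers and are not confirmed by this PR, and `TgtOffloadEntry` and `placeholder_entry` are hypothetical names:

```rust
use std::ffi::c_void;
use std::mem::size_of;
use std::os::raw::c_char;
use std::ptr;

/// Rust-side mirror of `__tgt_offload_entry`.
/// ASSUMPTION: field names (except `aux_addr`) follow LLVM's offload headers;
/// only the field types and their order are taken from the diff.
#[repr(C)]
pub struct TgtOffloadEntry {
    pub reserved: u64,            // i64
    pub version: u16,             // i16
    pub kind: u16,                // i16
    pub flags: u32,               // i32
    pub address: *mut c_void,     // ptr
    pub symbol_name: *mut c_char, // ptr
    pub size: u64,                // i64
    pub data: u64,                // i64
    pub aux_addr: *mut c_void,    // ptr
}

/// Hypothetical helper: builds an entry with the same constants as the `elems`
/// vector in the diff (0, 1, 1, 0, ..., 0, 0, null), using null pointers where
/// the compiler would insert the region id and the name global.
pub fn placeholder_entry() -> TgtOffloadEntry {
    TgtOffloadEntry {
        reserved: 0,
        version: 1,
        kind: 1,
        flags: 0,
        address: ptr::null_mut(),
        symbol_name: ptr::null_mut(),
        size: 0,
        data: 0,
        aux_addr: ptr::null_mut(),
    }
}

/// Size in bytes of the mirrored struct on the current target.
pub fn entry_size() -> usize {
    size_of::<TgtOffloadEntry>()
}
```

On a 64-bit target the fields pack without padding, so `entry_size()` should come out at 56 bytes; a mismatch against the size LLVM reports for the generated struct type would point at a wrong element list.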

Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. F-gpu_offload `#![feature(gpu_offload)]` T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
3 participants